The Tyranny of Latest
We've been bitten by this over and over: something that worked yesterday
no longer works today. You hear the cry, "I didn't change anything,
it just broke™". Well, something changed, and when it comes to runtime
environments, my instinct is to look for changes caused by a reference
that can be repointed, such as stable or latest, or even an omitted version.
To avoid unexpected pain, production systems should always be built from explicit versions.
Implicit Versions are Convenient, Initially
I'm going to pick on Docker image versioning and the latest tag here, but this
issue also applies to systems such as pip that allow the user to omit explicit
versions. The Docker tag named latest is a reference that points to a particular
Docker image. Typically this will be the newest version of the image,
but that is not enforced; it's only convention.
Regardless of where latest points, its implicit nature is the issue. An alias like
latest is a big convenience when you are initially building applications.
Typically you don't know (or care) which version of an image you want, and it makes
sense to let the image maintainer choose for you. However, when you move to
a production environment, this is needlessly risky.
An implicit version like latest can change without warning when your runtime
environment is launched. The convenience you received when building the
application is now a detriment.
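To make "implicit" concrete, here is a minimal sketch of a check that flags image references that can move underneath you. The function names and the set of floating tag names are my own assumptions for illustration, not part of Docker. It relies on two facts: a reference with no tag is treated as latest, and a reference pinned by digest always names exactly one image.

```python
from typing import Optional, Tuple

# Tag names treated as floating aliases (an assumption; adjust per image).
FLOATING_TAGS = {"latest", "stable", "rolling", "devel"}


def split_image_ref(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split an image reference into (name, tag, digest)."""
    name, _, digest = ref.partition("@")
    # Only a colon after the last slash separates a tag; an earlier colon
    # belongs to a registry port such as localhost:5000.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        return name[:colon], name[colon + 1:], digest or None
    return name, None, digest or None


def is_floating(ref: str) -> bool:
    """Return True when the reference can silently move to another image."""
    _, tag, digest = split_image_ref(ref)
    if digest:
        return False  # a digest identifies exactly one image
    # A missing tag is resolved as latest, so it floats too.
    return tag is None or tag in FLOATING_TAGS
```

A check like this could run in CI against the image references in your deployment files, failing the build whenever is_floating returns True.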
Things Break in Sneaky Ways
Let's rewind to the original, common cry from teams that are impacted by the "I didn't change anything, it just broke™" tyranny. I am sympathetic to this because things did just stop working, but obviously, something changed. Now the fun part begins: tracking down the actual issue.
If you suspect a Docker image changed due to an update to the latest
reference, you will need to find the output of the last successful launch of
the container and compare it with the failed one. Since we can no longer rely on
the image tags for comparison, we will need to compare each image's
manifest hash (its digest). This is a bit tricky and I suggest you
read this first.
Once you understand what you are looking for, compare the manifest hash of a successful launch to the hash of the failed one. If they are different, you may have found your smoking gun. Now relaunch with the old hash; if things work again, you understand exactly why unstable tags are evil.
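The comparison itself is simple once both hashes are in hand. The sketch below assumes each launch recorded the image in name@sha256:... form (for example, taken from the RepoDigests field that docker image inspect reports); the helper names are hypothetical, and how you capture the strings is up to your tooling.

```python
def digest_of(repo_digest: str) -> str:
    """Return the 'sha256:...' part of a 'name@sha256:...' string."""
    _, sep, digest = repo_digest.partition("@")
    if not sep or not digest.startswith("sha256:"):
        raise ValueError(f"no digest in {repo_digest!r}")
    return digest


def image_changed(last_good: str, failing: str) -> bool:
    """Return True when the two launches ran different image content."""
    return digest_of(last_good) != digest_of(failing)
```

If image_changed returns True, relaunching from the last good name@sha256:... reference is the quickest way to confirm that the moved tag is the culprit.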
How Specific is Specific?
In the case of Docker images, explicit versions can be a challenge. I'm advocating
against using the latest tag, but then which tag should
be used? Answering this takes some thought and inspection, as image tags have no
strict requirements.
Let's use a Linux distribution as an example to demonstrate the process.
Ubuntu's Docker image is a perfect place to
start. Ubuntu arranges its tags in a few levels of hierarchy. At the top
are latest, rolling, devel, etc. These tags are convenient but not stable
and will change frequently. The next, more specific level is the release codename,
such as focal, or the version number, such as 21.04. This tag may be more stable but is
still going to change as updates are rolled out. The most specific tag
offered by Ubuntu combines the release codename with a date stamp, such as
focal-20210723. Using this tag will ensure the underlying Ubuntu system is stable
until the tag is changed by the consumer. This is the right amount of specificity
for production systems.
The point is that you will have to do a bit of detective work to find the stable tag
to use in your application. Do not assume that a tag like 21.04 will remain stable
even though it seems specific.
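The Ubuntu hierarchy described above can be captured in a few lines. This is only a sketch: the regular expressions are my assumptions, derived from the example tags in this post, and other images follow other schemes, so treat the result as a hint rather than a guarantee.

```python
import re

# Floating aliases named in the text.
FLOATING = {"latest", "rolling", "devel"}
# Codename plus an eight-digit date, e.g. focal-20210723.
DATED = re.compile(r"^[a-z]+-\d{8}$")
# Bare codename or version number, e.g. focal or 21.04.
RELEASE = re.compile(r"^([a-z]+|\d{2}\.\d{2})$")


def tag_stability(tag: str) -> str:
    """Classify an Ubuntu-style tag as floating, release, dated, or unknown."""
    if tag in FLOATING:  # checked first: aliases also look like codenames
        return "floating"
    if DATED.match(tag):
        return "dated"
    if RELEASE.match(tag):
        return "release"
    return "unknown"
```

Only tags classified as dated match the level of specificity recommended here for production; release tags such as 21.04 still move as updates roll out.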
This Seems Like a Lot of Work
It is. It can be tedious, but so is tracking down the "I didn't change anything, it just broke™" issue. The upside is we have modern conveniences such as Helm, Ansible and a thousand other tools to reduce the burden.
The reality is, if you aren't using stable versions, sooner or later you will have an outage caused by the underlying environment mutating unexpectedly. So get into the habit of setting versions explicitly; you will sleep better at night.